Hydrogen in macromolecular models

Approximately 50% of the atoms in a protein are hydrogen. However, hydrogen atoms are absent from most molecular models. Most crystals do not have sufficient resolution (1.0 Ångstroms or better is needed) to determine the positions of hydrogen atoms. It is easy to add hydrogens to macromolecular models, but the results are only as good as the molecular models themselves.

Absence of Hydrogen Atoms in Most Macromolecular Models
Hydrogen atoms are absent from most molecular models in Proteopedia, which come mostly from the Protein Data Bank, because most macromolecular crystals do not have sufficient resolution to determine the positions of hydrogen atoms. However, it is easy to add hydrogen atoms, and doing so is a good idea because it helps to correct and validate the molecular model.

Although their positions are not well defined empirically in the electron density maps from typical macromolecular crystals, sometimes hydrogens are added to X-ray crystallographic models before they are deposited in the Protein Data Bank. This is the choice of the authors of the PDB file. Hydrogens are usually present in PDB files resulting from NMR analysis, and usually present in theoretical models.

Approximately 50% of Protein Atoms, and approximately 35% of Nucleic Acid Atoms, are Hydrogen
In proteins, the average number of hydrogens per non-hydrogen atom, weighted to take into account the frequencies of amino acids, is 1.01. Thus, hydrogens are ~50% of all atoms in proteins. Nucleic acids have fewer, ~35%.
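The two ways of stating hydrogen content (hydrogens per non-hydrogen atom, and percent of all atoms) are related by simple arithmetic. Writing $r$ for the number of hydrogens per non-hydrogen atom, the hydrogen fraction $f_H$ and its inverse are shown below. The nucleic-acid ratio of about 0.54 is derived here from the ~35% figure and is not a separately sourced value.

```latex
\begin{aligned}
f_H &= \frac{N_H}{N_H + N_{\mathrm{heavy}}} = \frac{r}{1+r},
  \qquad r = \frac{N_H}{N_{\mathrm{heavy}}} = \frac{f_H}{1 - f_H} \\
\text{proteins: } r = 1.01 &\;\Rightarrow\; f_H = \frac{1.01}{2.01} \approx 0.502 \approx 50\% \\
\text{nucleic acids: } f_H \approx 0.35 &\;\Rightarrow\; r \approx \frac{0.35}{0.65} \approx 0.54
\end{aligned}
```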

To determine the percentage of atoms that are hydrogen in a model in Proteopedia, click on the word Jmol in the lower right corner of the rotatable molecular scene, and then on Console. In the lower box of the Console window that opens, enter "select hydrogen" and note the atom count in the report in the upper box. Then do the same for "select not hydrogen".
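The same count can be made outside Jmol from a downloaded PDB-format file. The following is a minimal sketch, not how Jmol does it. It assumes the element symbol is in columns 77-78 of ATOM/HETATM records, and it falls back on a heuristic (the first letter of the atom name) when that column is blank. It treats deuterium as hydrogen and, by default, counts only the first model of an NMR ensemble. The function names are hypothetical.

```python
def count_hydrogens(pdb_text: str, first_model_only: bool = True) -> tuple[int, int]:
    """Return (hydrogen atoms, non-hydrogen atoms) from ATOM/HETATM records of PDB-format text."""
    n_h = n_other = 0
    for line in pdb_text.splitlines():
        if first_model_only and line.startswith("ENDMDL"):
            break  # NMR ensembles: stop after the first model
        if not line.startswith(("ATOM  ", "HETATM")):
            continue
        element = line[76:78].strip().upper()  # element symbol, columns 77-78
        if not element:
            # Heuristic fallback for files lacking the element column:
            # first letter of the atom name after any leading digit (e.g. "1HB" -> "H").
            element = line[12:16].strip().lstrip("0123456789")[:1].upper()
        if element in ("H", "D"):  # deuterium counted as hydrogen (an assumption)
            n_h += 1
        else:
            n_other += 1
    return n_h, n_other


def percent_hydrogen(n_h: int, n_other: int) -> float:
    """Hydrogens as a percentage of all counted atoms."""
    total = n_h + n_other
    return 100.0 * n_h / total if total else 0.0
```

For example, `count_hydrogens(open("1lkk.pdb").read())` would count a local copy of the file (hypothetical file name). Like the Jmol counts, this tally includes water and hetero atoms unless you filter them out.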

To visualize hydrogen atoms, use the FirstGlance link in the Resources section beneath the molecular scene. Once the model is displayed in FirstGlance in Jmol, change the background to black with the background toggle button, and click on Vines. In the help panel for Vines, check More detail. Hydrogen atoms are white. To hide and then show them, check Hide hydrogens, then uncheck it.

The value 1.01, for the average number of protein hydrogens per non-hydrogen protein atom, was calculated from the values for each amino acid, weighted by average frequencies of amino acids. The frequencies employed are based on 1,021 unrelated proteins of known sequence, tabulated on page 5 in Creighton (1993).
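A calculation of this kind can be sketched as follows. The per-residue counts are the standard formulas of uncharged amino-acid residues within a chain (amino acid minus H2O). Chain termini and the ionization of Asp, Glu, Lys, Arg and His at neutral pH are ignored, which shifts a few counts. The source does not say whether 1.01 is a weighted mean of per-residue ratios or a ratio of weighted totals, so using the latter here is an assumption. Creighton's frequencies are not reproduced, and the uniform frequencies below are only a placeholder. The result (about 0.94) is therefore not expected to match 1.01.

```python
# (hydrogens, non-hydrogen atoms) per residue in a chain, uncharged side chains assumed;
# terminal extra atoms (one H at the N-terminus, OXT at the C-terminus) are ignored.
RESIDUE_ATOMS = {
    "GLY": (3, 4), "ALA": (5, 5), "SER": (5, 6), "PRO": (7, 7), "VAL": (9, 7),
    "THR": (7, 7), "CYS": (5, 6), "LEU": (11, 8), "ILE": (11, 8), "ASN": (6, 8),
    "ASP": (5, 8), "GLN": (8, 9), "LYS": (12, 9), "GLU": (7, 9), "MET": (9, 8),
    "HIS": (7, 10), "PHE": (9, 11), "ARG": (12, 11), "TYR": (9, 12), "TRP": (10, 14),
}


def hydrogens_per_heavy_atom(frequencies: dict[str, float]) -> float:
    """Ratio of frequency-weighted hydrogen total to frequency-weighted heavy-atom total."""
    total_h = sum(f * RESIDUE_ATOMS[aa][0] for aa, f in frequencies.items())
    total_heavy = sum(f * RESIDUE_ATOMS[aa][1] for aa, f in frequencies.items())
    return total_h / total_heavy


# Placeholder: equal frequencies for all 20 amino acids (NOT Creighton's values).
uniform = {aa: 1 / 20 for aa in RESIDUE_ATOMS}
print(round(hydrogens_per_heavy_atom(uniform), 3))  # prints 0.94
```

Substituting a real frequency table for `uniform` is all that would be needed to redo the calculation with published frequencies.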

Empirically-Positioned Hydrogens in High-Resolution Crystallographic Models
High resolution protein crystallography (1.2 Ångstroms or better) can assign some hydrogen positions empirically from the electron density map, and very high resolution crystals (1.0 Ångstroms or better) can assign the positions of most hydrogens.


 * Example: The X-ray model of a tyrosine kinase SH2 domain, 1lkk, at 1.0 Å resolution contains 901 hydrogens and 920 non-hydrogen protein atoms (ratio 0.98, 49%), so nearly all of the hydrogens actually present are assigned positions.
 * Example: The X-ray model of a domain from CD11a, 1lfa, at 1.8 Å resolution contains 639 protein hydrogens and 2,939 non-hydrogen protein atoms (ratio 0.22, 17.9%). Only the polar hydrogens are present in the model. Since they could not be resolved at the resolution of this crystal, they were supplied by theoretical modeling after the heavy atoms were positioned according to the electron density map. All 312 waters are modeled as H2O.
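The ratios in these examples can be turned into a rough estimate of how complete the hydrogen set is by dividing the observed hydrogen count by the number expected from the protein average of 1.01 hydrogens per non-hydrogen atom. This is only an estimate, because the true ratio depends on each protein's amino acid composition. Protein atoms alone should be counted, excluding waters and ligands. The function name is hypothetical.

```python
EXPECTED_H_PER_HEAVY_ATOM = 1.01  # protein average quoted in the text


def fraction_of_hydrogens_modeled(n_hydrogens: int, n_heavy: int,
                                  expected_ratio: float = EXPECTED_H_PER_HEAVY_ATOM) -> float:
    """Estimated fraction of a protein's hydrogens that have coordinates in the model."""
    return n_hydrogens / (expected_ratio * n_heavy)


for pdb_id, n_h, n_heavy in [("1lkk", 901, 920), ("1lfa", 639, 2939)]:
    print(pdb_id, f"{fraction_of_hydrogens_modeled(n_h, n_heavy):.0%}")
# prints "1lkk 97%" then "1lfa 22%"
```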

Theoretically-Positioned Hydrogens in Average-Resolution Crystallographic Models
As explained above, most macromolecular crystals do not provide high enough resolution to detect hydrogen positions empirically. The median resolution of models in the Protein Data Bank is 2.0 Å.


 * Example: No hydrogens: The X-ray model in PDB file 1hho for oxyhemoglobin (2.1 Å resolution) contains no hydrogens.


 * Example: Some hydrogens from theory: The X-ray file 1lfa (1.8 Å resolution; an integrin adhesion protein domain) contains 312 waters, each with 2 hydrogens (624 water hydrogens), plus 639 protein hydrogens for 2,939 non-hydrogen protein atoms. These protein hydrogens are only about 22% (639/2,939) of the hydrogens actually present in this protein. They consist of one hydrogen on each backbone nitrogen (three on the amino-terminal nitrogen), and hydrogens on the sidechain oxygens or nitrogens of Ser, Thr, Tyr, Lys, Arg, His, Asn, and Gln. None of the hydrogens covalently bonded to carbons are present. The hydrogens that are present are required for the molecular dynamics stages of X-ray model refinement in the popular crystallographic refinement program X-PLOR; some authors strip them out before depositing a PDB file and others leave them in. The Protein Data Bank accepts X-ray models either way, according to the preference of the depositor.


 * Example: All hydrogens from theory:

Hydrogens in NMR Models
NMR methods also determine some hydrogen positions. Typically all hydrogens are modeled in before the molecule is folded to fit the NMR interatomic distance restraints; hence, all hydrogens are usually present in NMR models submitted to the PDB.


 * Example: The calmodulin ensemble of 25 NMR models 1cfc contains 1096 protein hydrogens and 1166 non-hydrogen protein atoms per model (ratio 0.94, 48.5%), thereby assigning positions for approximately all of the hydrogens actually present.
 * Example: The lac repressor:DNA complex ensemble of 3 NMR models 1lcd contains 294 protein hydrogens and 1197 non-hydrogen protein atoms per model (ratio 0.25, 19.7%). Only the polar protein hydrogens are present in this model. There are 141 hydrogens in the DNA, and 1,335 non-hydrogen DNA atoms (9.6%). Only the Watson-Crick and terminal deoxyribose hydrogens are present. All 138 water molecules are modeled as H2O. Individual models contain 243, 235, and 233 hydrogens. (I did not determine the basis for these differences.)
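Whether a deposited model carries only polar hydrogens (as in 1lcd and 1lfa) or all of them can be checked by finding, for each hydrogen, the heavy atom it sits closest to. Hydrogens on N or O are polar; those on C are not. The sketch below assumes the atoms have already been read into `(element, (x, y, z))` tuples, for instance from the element and coordinate columns of ATOM/HETATM records. It uses an assumed 1.5 Å cutoff for a covalent X-H bond, which is long enough for S-H. For NMR ensembles, apply it to one model at a time. The function name is hypothetical.

```python
import math

Atom = tuple[str, tuple[float, float, float]]  # (element, (x, y, z)) in Å


def classify_hydrogens(atoms: list[Atom], max_bond: float = 1.5) -> dict[str, int]:
    """Count hydrogens by the element of their nearest non-hydrogen atom.

    A hydrogen farther than max_bond Å from every heavy atom is counted as "unbonded".
    """
    heavy = [(el, xyz) for el, xyz in atoms if el not in ("H", "D")]
    counts: dict[str, int] = {}
    for el, xyz in atoms:
        if el not in ("H", "D"):
            continue
        partner, best = "unbonded", max_bond
        for heavy_el, heavy_xyz in heavy:  # brute-force nearest-neighbor search
            d = math.dist(xyz, heavy_xyz)
            if d <= best:
                partner, best = heavy_el, d
        counts[partner] = counts.get(partner, 0) + 1
    return counts


example = [("N", (0.0, 0.0, 0.0)), ("H", (1.0, 0.0, 0.0)),
           ("C", (5.0, 0.0, 0.0)), ("H", (5.0, 1.09, 0.0))]
print(classify_hydrogens(example))  # {'N': 1, 'C': 1}
```

A result with hydrogens on N and O but a zero count for C would indicate a polar-hydrogens-only model.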

Adding Hydrogens From Theory
It is easy to add hydrogens to macromolecular models (PDB files) using the highly reliable free servers listed below. Beware that the results are only as good as the molecular models themselves: uncertainties in the positions of non-hydrogen atoms will produce inaccurate positions for hydrogen atoms. In fact, the quality of a molecular model can be judged in part by how well the hydrogens fit into the spaces between the non-hydrogen atoms. This degree of fit is quantified by the overall clash score reported by the first method below, MolProbity.
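MolProbity's clashscore is generally described as the number of serious all-atom overlaps (van der Waals overlap of at least 0.4 Å) per 1000 atoms, computed by its Probe program on a model with hydrogens present. The sketch below is a much-simplified stand-in that only illustrates that definition. It takes atoms as `(element, (x, y, z))` tuples, a caller-supplied table of van der Waals radii (the 1.7 Å value in the usage line is illustrative), and a set of atom pairs to ignore, such as covalently bonded neighbors. It has none of Probe's handling of hydrogen bonds or contact dots, so its numbers will not match MolProbity's. The function name is hypothetical.

```python
import itertools
import math


def clashscore_sketch(atoms, radii, excluded_pairs=frozenset(), cutoff=0.4):
    """Simplified serious-overlaps-per-1000-atoms score.

    atoms: list of (element, (x, y, z)) tuples, coordinates in Å.
    radii: element -> van der Waals radius in Å (values supplied by the caller).
    excluded_pairs: (i, j) index pairs with i < j to ignore, e.g. covalently bonded atoms.
    A pair counts as a serious clash when r_i + r_j - distance >= cutoff.
    """
    if not atoms:
        return 0.0
    n_clashes = 0
    for (i, (el_i, xyz_i)), (j, (el_j, xyz_j)) in itertools.combinations(enumerate(atoms), 2):
        if (i, j) in excluded_pairs:
            continue
        overlap = radii[el_i] + radii[el_j] - math.dist(xyz_i, xyz_j)
        if overlap >= cutoff:
            n_clashes += 1
    return 1000.0 * n_clashes / len(atoms)


atoms = [("C", (0.0, 0.0, 0.0)), ("C", (2.9, 0.0, 0.0)), ("C", (10.0, 0.0, 0.0))]
print(round(clashscore_sketch(atoms, {"C": 1.7}), 1))  # prints 333.3 (one clash among 3 atoms)
```

The all-pairs loop is quadratic in the number of atoms, which is fine for illustration but not for whole structures.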


 * Use the Richardson Lab's easy and very powerful MolProbity: All-Atom Contact Analysis server. Hydrogens are added to both protein and nucleic acids (but not to water), and you can save the resulting PDB file. This server has the advantage that you also get a powerful analysis of the quality of the model, including which Gln/Asn/His residues should have their sidechains flipped, an overall clash score, etc. You can save a model with the recommended sidechains flipped. Also you can visualize clashes anywhere in the model, including with the sidechains flipped or not flipped.


 * Use the Vriend Lab's WHATIF WWW Interface. Hydrogens are added to both protein and nucleic acids and also to water.
    * Under Classes (at left) click "Hydrogen (bonds)".
    * Select "Add protons to the structure".
    * Enter your PDB ID or upload a coordinate file.
    * After the results appear, click on the pdb link to receive the coordinate file containing added hydrogens.

PDB files saved from either of these servers can, for example, be uploaded for visualization in FirstGlance in Jmol.

1d66 is an early (1992), modest-resolution (2.7 Å) crystallographic model containing protein, DNA, and 51 water oxygens. MolProbity reports its clashscore as 11.7 (65th percentile). The model deposited in the PDB contains no hydrogen atoms. The results from the two servers above:

1d66: WHATIF protonates the sulfurs of the 12 cysteines that coordinate 4 cadmium ions, and the 2 N-terminal nitrogens (6 H atoms), while MolProbity does not. MolProbity protonates the terminal hydroxyls on the 2 DNA chains, while WHATIF does not. All hydrogens added by MolProbity appeared to have reasonable geometries, while some of those added by WHATIF did not.
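Differences like these can be found without inspecting every residue by eye. One approach is to tally hydrogens per residue in each server's output file and list the residues where the tallies disagree. The sketch below assumes both files are in PDB format with element symbols in columns 77-78. It compares counts rather than atom names because the two programs may not name added hydrogens identically. The function names are hypothetical.

```python
from collections import Counter


def hydrogens_per_residue(pdb_text: str) -> Counter:
    """Count H/D atoms per residue, keyed by (chain ID, residue number, residue name)."""
    counts: Counter = Counter()
    for line in pdb_text.splitlines():
        if not line.startswith(("ATOM  ", "HETATM")):
            continue
        if line[76:78].strip().upper() in ("H", "D"):  # element, columns 77-78
            key = (line[21], line[22:26].strip(), line[17:20].strip())
            counts[key] += 1
    return counts


def compare_protonation(text_a: str, text_b: str) -> dict:
    """Residues whose hydrogen counts differ: key -> (count in a, count in b)."""
    a, b = hydrogens_per_residue(text_a), hydrogens_per_residue(text_b)
    return {res: (a[res], b[res]) for res in sorted(set(a) | set(b)) if a[res] != b[res]}
```

For example, `compare_protonation(open("1d66_whatif.pdb").read(), open("1d66_molprobity.pdb").read())` (hypothetical file names) would list residues such as cysteines or DNA termini where one server added hydrogens that the other did not.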

Content Attribution
Most of the original content in this article was adapted, with permission, from documentation written earlier by User:Eric Martz, for several locations in Protein Explorer: Water, Hydrogens in PDB files, and Hydrogen in the Help/Index/Glossary.

Thanks to John Badger for key contributions.